---
title: "Using a delay-adjusted case fatality ratio to estimate under-reporting"
description: "Using a corrected case fatality ratio, we calculate estimates of the level of under-reporting for any country with greater than ten deaths"
status: real-time-report
rmarkdown_html_fragment: true
update: 2020-06-19
authors:
  - id: tim_russell
    corresponding: true
  - id: joel_hellewell
    equal: 1
  - id: sam_abbott
    equal: 1
  - id: nick_golding
  - id: hamish_gibbs
  - id: chris_jarvis
  - id: kevin_vanzandvoort
  - id: ncov-group
  - id: stefan_flasche
  - id: roz_eggo
  - id: john_edmunds
  - id: adam_kucharski
---

Aim

To estimate the percentage of symptomatic COVID-19 cases reported in different countries using case fatality ratio estimates based on data from the ECDC, correcting for delays between confirmation and death.

Methods Summary

Current estimates for percentage of symptomatic cases reported for countries with greater than ten deaths

Temporal variation

Figure 1: Temporal variation in reporting rate. We calculate the percentage of symptomatic cases reported on each day a country has had more than ten deaths. We then fit a Gaussian Process (GP) to these data (see Temporal variation model fitting section for details), highlighting the temporal trend of each country's reporting rate. The red shaded region is the 95% CrI of the fitted GP.

Adjusted symptomatic case estimates

Figure 2: Estimated number of new symptomatic cases, calculated using our temporal under-reporting estimates. For each country with an under-reporting estimate, we adjust the reported case numbers each day using our temporal under-reporting estimates to arrive at an estimate of the true number of symptomatic cases each day. The shaded blue region represents the 95% CrI, calculated directly using the 95% CrI of the temporal under-reporting estimate.

Reported cases

Figure 3: Reported number of cases each day, pulled from the ECDC and plotted against time for comparison with our estimated true numbers of symptomatic cases each day, adjusted using our under-reporting estimates.

Table of current estimates

Country Percentage of symptomatic cases reported (95% CI) Total cases Total deaths
Afghanistan 59% (43%-79%) 26,874 504
Albania 89% (49%-100%) 1,722 38
Algeria 20% (14%-28%) 11,268 799
Andorra 37% (18%-82%) 854 52
Argentina 52% (40%-65%) 35,539 913
Armenia 55% (42%-69%) 18,033 302
Australia 91% (67%-100%) 7,370 102
Austria 84% (43%-100%) 17,115 687
Azerbaijan 82% (61%-100%) 10,991 133
Bahrain 97% (87%-100%) 19,961 49
Bangladesh 85% (68%-100%) 98,489 1,305
Belarus 99% (87%-100%) 56,032 324
Belgium 38% (25%-52%) 60,244 9,675
Bolivia 34% (26%-42%) 20,685 679
Bosnia and Herzegovina 59% (26%-96%) 3,141 168
Brazil 37% (30%-44%) 955,377 46,510
Bulgaria 22% (16%-31%) 3,542 184
Burkina Faso 59% (26%-97%) 899 53
Cameroon 32% (22%-46%) 9,864 276
Canada 25% (20%-30%) 99,842 8,254
Chad 60% (21%-96%) 854 74
Chile 59% (52%-66%) 220,628 3,615
China 38% (16%-67%) 84,458 4,638
Colombia 28% (22%-34%) 57,046 1,864
Congo 43% (23%-75%) 883 27
Costa Rica 89% (62%-100%) 1,871 12
Cote d'Ivoire 93% (75%-100%) 6,063 48
Croatia 21% (8.5%-41%) 2,258 107
Cuba 82% (49%-100%) 2,280 84
Cyprus 84% (46%-100%) 985 18
Czechia 76% (48%-99%) 10,162 333
Democratic Republic of the Congo 61% (29%-92%) 5,099 115
Denmark 59% (36%-83%) 12,294 598
Djibouti 90% (68%-100%) 4,545 43
Dominican Republic 64% (37%-95%) 24,105 633
Ecuador 17% (14%-21%) 48,490 4,007
Egypt 26% (21%-32%) 49,219 1,850
El Salvador 56% (39%-78%) 4,066 82
Equatorial Guinea 91% (67%-100%) 1,306 12
Estonia 43% (25%-76%) 1,977 69
Ethiopia 38% (27%-53%) 3,759 63
Finland 81% (48%-100%) 7,117 326
France 46% (36%-54%) 158,174 29,575
Gabon 95% (78%-100%) 4,229 30
Georgia 77% (40%-100%) 893 14
Germany 47% (36%-60%) 187,764 8,856
Ghana 98% (88%-100%) 12,590 66
Greece 26% (16%-41%) 3,203 187
Guatemala 29% (21%-39%) 11,251 432
Guinea 96% (83%-100%) 4,668 26
Guinea Bissau 58% (25%-98%) 1,492 15
Guyana 55% (17%-99%) 171 12
Haiti 83% (52%-100%) 4,688 82
Honduras 38% (27%-51%) 10,299 336
Hungary 13% (8.4%-18%) 4,079 568
Iceland 86% (48%-100%) 1,815 10
India 33% (27%-39%) 366,946 12,237
Indonesia 22% (17%-27%) 41,431 2,276
Iran 38% (31%-46%) 195,051 9,185
Iraq 23% (18%-29%) 24,254 773
Ireland 26% (17%-40%) 25,341 1,710
Israel 85% (66%-99%) 19,894 303
Italy 12% (9.6%-14%) 237,828 34,448
Jamaica 64% (19%-100%) 626 10
Japan 44% (28%-63%) 17,668 935
Jersey 20% (7.3%-58%) 318 30
Kazakhstan 95% (62%-100%) 15,877 100
Kenya 52% (34%-75%) 4,044 107
Kosovo 72% (41%-100%) 1,486 33
Kuwait 97% (89%-100%) 37,533 306
Kyrgyzstan 85% (57%-100%) 2,657 31
Latvia 40% (17%-72%) 1,104 30
Lebanon 78% (43%-100%) 1,489 32
Liberia 27% (12%-60%) 509 33
Lithuania 28% (14%-44%) 1,778 76
Luxembourg 56% (36%-84%) 4,085 110
Malaysia 96% (73%-100%) 8,515 121
Mali 23% (16%-35%) 1,890 107
Mauritania 14% (9.9%-19%) 2,057 93
Mexico 12% (9.7%-14%) 159,793 19,080
Moldova 33% (25%-41%) 12,732 433
Morocco 97% (79%-100%) 8,997 213
Nepal 96% (85%-100%) 7,177 20
Netherlands 48% (34%-65%) 49,204 6,074
Nicaragua 93% (59%-100%) 2,014 64
Niger 31% (13%-76%) 1,020 67
Nigeria 54% (41%-70%) 17,735 469
North Macedonia 22% (16%-29%) 4,482 210
Norway 81% (41%-100%) 8,660 243
Oman 98% (90%-100%) 26,079 116
Pakistan 57% (46%-68%) 160,118 3,093
Panama 73% (45%-100%) 22,597 470
Paraguay 93% (71%-100%) 1,308 13
Peru 35% (28%-42%) 240,908 7,257
Philippines 70% (52%-88%) 27,238 1,108
Poland 47% (35%-60%) 30,701 1,286
Portugal 62% (45%-80%) 37,672 1,523
Puerto Rico 93% (77%-100%) 6,003 147
Qatar 87% (40%-100%) 83,174 82
Romania 20% (14%-28%) 22,760 1,451
Russia 68% (57%-79%) 553,301 7,478
San Marino 84% (32%-100%) 696 42
Sao Tome and Principe 83% (49%-100%) 683 12
Saudi Arabia 51% (41%-63%) 141,234 1,091
Senegal 85% (61%-100%) 5,369 73
Serbia 94% (69%-100%) 12,522 257
Sierra Leone 72% (40%-99%) 1,249 51
Singapore 88% (50%-100%) 41,216 26
Slovakia 75% (43%-100%) 1,561 28
Slovenia 21% (11%-44%) 1,513 109
Somalia 70% (42%-96%) 2,696 88
South Africa 39% (31%-46%) 80,412 1,674
South Korea 82% (52%-100%) 12,257 280
South Sudan 59% (32%-93%) 1,807 31
Sri Lanka 93% (74%-100%) 1,924 11
Sudan 20% (14%-28%) 8,020 487
Sweden 50% (40%-60%) 54,562 5,041
Switzerland 35% (24%-50%) 31,100 1,677
Tajikistan 99% (89%-100%) 5,221 51
Thailand 79% (48%-100%) 3,141 58
Togo 83% (50%-100%) 544 13
Tunisia 54% (20%-99%) 1,128 50
Turkey 85% (69%-97%) 182,727 4,861
Ukraine 48% (35%-63%) 33,234 943
United Arab Emirates 98% (85%-100%) 43,364 295
United Kingdom 19% (15%-22%) 299,251 42,153
United Republic of Tanzania 60% (28%-98%) 509 21
United States of America 45% (37%-53%) 2,163,290 117,717
Uruguay 55% (27%-96%) 849 24
Uzbekistan 97% (85%-100%) 5,697 19
Venezuela 94% (76%-100%) 3,150 27
Yemen 1.1% (0.78%-1.6%) 902 244

Table 1: Estimates for the proportion of symptomatic cases reported in different countries using delay-corrected CFR (cCFR) estimates based on case and death time-series data from the ECDC. Total cases and deaths in each country are also shown. Confidence intervals are calculated using an exact binomial test at the 95% confidence level.
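As an illustration of the interval mentioned in the caption, the sketch below computes an exact binomial interval in the Clopper-Pearson form, which is the interval usually reported alongside an exact binomial test. It is not the authors' code: the function names are hypothetical, and treating deaths as successes and cases with known outcomes as trials is an assumption about how the interval is applied.

```python
import math

def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p), with each term computed in log space."""
    if k < 0:
        return 0.0
    if k >= n:
        return 1.0
    total = 0.0
    for i in range(k + 1):
        log_pmf = (math.lgamma(n + 1) - math.lgamma(i + 1) - math.lgamma(n - i + 1)
                   + i * math.log(p) + (n - i) * math.log1p(-p))
        total += math.exp(log_pmf)
    return min(1.0, total)

def bisect_increasing(func, target, lo=1e-12, hi=1.0 - 1e-12, iters=200):
    """Solve func(p) = target by bisection, for func increasing in p."""
    for _ in range(iters):
        mid = (lo + hi) / 2.0
        if func(mid) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0

def clopper_pearson(k, n, alpha=0.05):
    """Exact (Clopper-Pearson) interval for a binomial proportion k / n."""
    # Lower limit: the p at which P(X >= k) equals alpha / 2.
    lower = 0.0 if k == 0 else bisect_increasing(
        lambda p: 1.0 - binom_cdf(k - 1, n, p), alpha / 2.0)
    # Upper limit: the p at which P(X <= k) equals alpha / 2.
    upper = 1.0 if k == n else bisect_increasing(
        lambda p: 1.0 - binom_cdf(k, n, p), 1.0 - alpha / 2.0)
    return lower, upper

# Hypothetical example: 3 deaths among 50 cases with known outcomes.
lower, upper = clopper_pearson(3, 50)
```

Bisection on the binomial CDF avoids needing a beta quantile function, at the cost of speed for very large counts.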

Adjusting for outcome delay in CFR estimates

During an outbreak, the naive CFR (nCFR), i.e. the ratio of reported deaths to date to reported cases to date, will underestimate the true CFR because the outcome (recovery or death) is not known for all cases [5]. We can therefore estimate the true denominator for the CFR (i.e. the number of cases with known outcomes) by accounting for the delay from confirmation to death [1].

We assumed the delay from confirmation to death followed the same distribution as the estimated hospitalisation-to-death delay, based on data from the COVID-19 outbreak in Wuhan, China, between the 17th December 2019 and the 22nd January 2020, accounting for right-censoring in the data as a result of as-yet-unknown disease outcomes (Figure 1, panels A and B in [7]). The distribution used is a lognormal fit with a mean delay of 13 days and a standard deviation of 12.7 days [7].
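To make the delay distribution concrete, the following sketch converts the stated mean (13 days) and standard deviation (12.7 days) into log-scale parameters using the standard lognormal moment relations, then bins the distribution into daily probabilities. The function names and the choice to bin by whole days, with \(f_j\) as the probability of a delay between \(j\) and \(j+1\) days, are assumptions made for illustration.

```python
import math

def lognormal_params(mean, sd):
    """Convert natural-scale mean and sd to log-scale (mu, sigma)."""
    sigma2 = math.log(1.0 + (sd / mean) ** 2)
    mu = math.log(mean) - sigma2 / 2.0
    return mu, math.sqrt(sigma2)

def lognormal_cdf(x, mu, sigma):
    """Cumulative distribution function of a lognormal distribution."""
    if x <= 0:
        return 0.0
    return 0.5 * (1.0 + math.erf((math.log(x) - mu) / (sigma * math.sqrt(2.0))))

def daily_delay_pmf(n_days, mean=13.0, sd=12.7):
    """f_j = P(j <= delay < j + 1) for j = 0, ..., n_days - 1 (daily binning is an assumption)."""
    mu, sigma = lognormal_params(mean, sd)
    return [lognormal_cdf(j + 1, mu, sigma) - lognormal_cdf(j, mu, sigma)
            for j in range(n_days)]

mu, sigma = lognormal_params(13.0, 12.7)
f = daily_delay_pmf(60)  # probability mass for delays of 0 to 59 days
```

Because the distribution has a long right tail, a 60-day window does not capture all of the probability mass; a longer window or renormalisation may be preferable in practice.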

To correct the CFR, we use the case and death incidence data to estimate the proportion of cases with known outcomes [1,6]:

\[ u_{t} = \frac{ \sum_{j = 0}^{t} c_{t-j} f_j}{c_t}, \]

where \(u_t\) represents the underestimation of the proportion of cases with known outcomes [1,5,6] and is used to scale the cumulative number of cases in the denominator of the cCFR calculation, \(c_{t}\) is the daily case incidence at time \(t\), and \(f_j\) is the proportion of cases with a delay of \(j\) days between confirmation and death.
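A minimal sketch of the correction on toy data is given below. It evaluates the numerator \(\sum_{j=0}^{t} c_{t-j} f_j\) for each day, then divides total deaths by the total number of cases with known outcomes to obtain a cCFR. The function names and toy numbers are hypothetical, and summing the daily known-outcome counts to form the cCFR denominator is one reading of the text rather than a confirmed description of the published code.

```python
def known_outcomes(cases, f):
    """For each day t, compute sum_{j=0}^{t} c_{t-j} * f_j."""
    known = []
    for t in range(len(cases)):
        total = 0.0
        for j in range(t + 1):
            if j < len(f):  # delays beyond the tabulated range contribute nothing
                total += cases[t - j] * f[j]
        known.append(total)
    return known

def naive_and_corrected_cfr(cases, deaths, f):
    """Return (nCFR, cCFR) from daily case and death incidence."""
    known = known_outcomes(cases, f)
    ncfr = sum(deaths) / sum(cases)
    ccfr = sum(deaths) / sum(known)  # assumed aggregation over days
    return ncfr, ccfr

# Toy data: cases double each day and every outcome occurs 1 or 2 days after confirmation.
cases = [10, 20, 40, 80]
deaths = [0, 0, 1, 2]
f = [0.0, 0.5, 0.5]

known = known_outcomes(cases, f)
ncfr, ccfr = naive_and_corrected_cfr(cases, deaths, f)
```

In this toy example only 50 of the 150 reported cases have a known outcome, so the corrected ratio (3/50) is three times the naive ratio (3/150).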

Approximating the proportion of symptomatic cases reported

At this stage, raw estimates of the CFR of COVID-19 correcting for delay to outcome, but not under-reporting, have been calculated. These estimates range between 1% and 1.5% [1–3]. We assume a CFR of 1.4% (95% CrI: 1.2-1.7%), taken from a recent large study [3], as a baseline CFR. We use it to approximate the potential level of under-reporting in each country. Specifically, we perform the calculation \(\frac{1.4\%}{\text{cCFR}}\) for each country to estimate an approximate fraction of cases reported.
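As a worked example of this ratio, the sketch below divides the 1.4% baseline by a country's cCFR. The function name is hypothetical, and capping the result at 100% is an assumption, suggested by the table estimates never exceeding 100%.

```python
BASELINE_CFR = 0.014  # 1.4% baseline CFR assumed in the text

def proportion_reported(ccfr, baseline_cfr=BASELINE_CFR):
    """Approximate fraction of symptomatic cases reported: baseline CFR / cCFR."""
    if ccfr <= 0:
        raise ValueError("cCFR must be positive")
    # Cap at 1.0: a cCFR below the baseline is read as full reporting (assumption).
    return min(1.0, baseline_cfr / ccfr)

high_ccfr = proportion_reported(0.07)  # cCFR of 7% gives 1.4 / 7 = 20% reported
low_ccfr = proportion_reported(0.01)   # cCFR of 1% is below the baseline, so capped at 100%
```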

Temporal variation model fitting

We estimate the level of under-reporting on every day for each country that has had more than ten deaths. We then fit a Gaussian Process (GP) model using the greta and greta.gp libraries. The parameters we fit and their priors are the following:

\[ \begin{aligned} &\sigma \sim \text{Log Normal(-1, 1)}: \quad &\text{Variance of the reporting kernel} \\ &\text{L} \sim \text{Log Normal(4, 0.5)}: \quad &\text{Lengthscale of the reporting kernel} \\ &\sigma_{\text{obs}} \sim \text{Truncated Normal(0, 0.5)}: \quad &\text{Variance of the observation kernel, truncated at 0} \end{aligned} \]

The kernel is split into two components: the reporting kernel \(R\), and the observation kernel \(O\). The reporting component has a standard squared-exponential form. For the observation component, we use an i.i.d. noise kernel to account for observation overdispersion, which can smooth out overly clumped death time-series. This is important as some countries have been known to report an unusually large number of deaths on a single day, due to past under-reporting.

In the sampling and fitting process, we calculate the expected number of deaths at each time-point, given the baseline CFR. We then use a Poisson likelihood for the observed number of deaths, with the expected number of deaths as its rate.
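The structure of the two-component kernel can be sketched in plain Python as follows. This is not greta or greta.gp code. It assumes the two components are added together, treats \(\sigma\) and \(\sigma_{\text{obs}}\) as variances (following the labels above), and evaluates the kernel at illustrative values near the prior medians, assuming the log-normal priors are parameterised by log-scale mean and standard deviation. All function names are hypothetical.

```python
import math

def reporting_kernel(t1, t2, variance, lengthscale):
    """Squared-exponential covariance between two time points."""
    return variance * math.exp(-((t1 - t2) ** 2) / (2.0 * lengthscale ** 2))

def full_kernel(times, variance, lengthscale, obs_variance):
    """Covariance matrix R + O: squared-exponential plus i.i.d. noise on the diagonal."""
    n = len(times)
    return [[reporting_kernel(times[i], times[j], variance, lengthscale)
             + (obs_variance if i == j else 0.0)
             for j in range(n)]
            for i in range(n)]

# Illustrative values only: exp(-1) and exp(4) are the medians of the log-normal priors
# under the assumed parameterisation; 0.1 is an arbitrary observation variance.
variance = math.exp(-1.0)
lengthscale = math.exp(4.0)
obs_variance = 0.1

K = full_kernel([0, 1, 2, 30], variance, lengthscale, obs_variance)
```

With a lengthscale of roughly 55 days, neighbouring days are almost perfectly correlated under the reporting kernel, so day-to-day jumps in the data are absorbed by the noise term on the diagonal.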
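The likelihood step can be illustrated with a short sketch. Rearranging the definitions above (proportion reported = baseline CFR / cCFR, and cCFR = deaths / cases with known outcomes) gives expected deaths = baseline CFR × known-outcome cases / proportion reported. The function names and toy numbers are hypothetical, and this is a plain-Python illustration rather than the greta model itself.

```python
import math

def expected_deaths(known_cases, reporting, baseline_cfr=0.014):
    """Expected deaths implied by a reporting proportion and the baseline CFR."""
    return baseline_cfr * known_cases / reporting

def poisson_log_likelihood(observed_deaths, known_cases, reporting, baseline_cfr=0.014):
    """Sum over days of log Poisson(observed deaths | rate = expected deaths).

    Assumes known_cases and reporting are strictly positive on every day.
    """
    total = 0.0
    for d, k, r in zip(observed_deaths, known_cases, reporting):
        rate = expected_deaths(k, r, baseline_cfr)
        total += d * math.log(rate) - rate - math.lgamma(d + 1)
    return total

# Toy example: 28 deaths among 1,000 known-outcome cases matches 50% reporting.
ll_half = poisson_log_likelihood([28], [1000.0], [0.5])
ll_high = poisson_log_likelihood([28], [1000.0], [0.9])
ll_low = poisson_log_likelihood([28], [1000.0], [0.2])
```

In the toy example the log-likelihood is highest at a reporting proportion of 0.5, where the expected number of deaths equals the observed 28.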

Adjusting case counts for under-reporting

We adjust the reported number of cases each day, pulled from the ECDC. Specifically, we divide the case numbers of each day by our “proportion of cases reported” estimates that we calculate each day for each country.
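The adjustment amounts to an element-wise division, sketched below with hypothetical numbers. Note that the lower bound of the reporting estimate yields the upper bound of the adjusted case count, and vice versa.

```python
def adjust_cases(reported_cases, proportion_reported):
    """Divide each day's reported cases by that day's estimated proportion reported."""
    return [c / p for c, p in zip(reported_cases, proportion_reported)]

# Hypothetical daily reported cases and reporting estimates (median and 95% CrI bounds).
reported = [100, 200, 300]
reporting_mid = [0.5, 0.4, 0.25]
reporting_low = [0.4, 0.3, 0.2]
reporting_high = [0.6, 0.5, 0.3]

adjusted_mid = adjust_cases(reported, reporting_mid)
adjusted_upper = adjust_cases(reported, reporting_low)   # low reporting implies more true cases
adjusted_lower = adjust_cases(reported, reporting_high)  # high reporting implies fewer true cases
```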

Limitations

Implicit in assuming that the under-reporting is \(\frac{1.4\%}{\text{cCFR}}\) for a given country is that the deviation away from the assumed 1.4% CFR is entirely down to under-reporting. In reality, the burden on the healthcare system is a likely contributing factor to CFR estimates higher than 1.4%, along with many other country-specific factors.

The following is a list of the other prominent assumptions made in our analysis:

Code and data availability

The code is publicly available at https://github.com/thimotei/CFR_calculation. The data required for this analysis are time-series for both cases and deaths, along with the corresponding delay distribution. We scrape these data from the ECDC, using the NCoVUtils package [8].

Acknowledgements

The authors, on behalf of the Centre for the Mathematical Modelling of Infectious Diseases (CMMID) COVID-19 working group, wish to thank DSTL for providing the High Performance Computing facilities and associated expertise that has enabled these models to be prepared, run and processed in an appropriately rapid and highly efficient manner.

References

1 Russell TW, Hellewell J, Jarvis CI et al. Estimating the infection and case fatality ratio for COVID-19 using age-adjusted data from the outbreak on the Diamond Princess cruise ship. medRxiv 2020.

2 Verity R, Okell LC, Dorigatti I et al. Estimates of the severity of COVID-19 disease. medRxiv 2020.

3 Guan W-j, Ni Z-y, Hu Y et al. Clinical characteristics of coronavirus disease 2019 in China. New England Journal of Medicine 2020.

4 Shim E, Mizumoto K, Choi W et al. Estimating the risk of COVID-19 death during the course of the outbreak in Korea, February-March, 2020. medRxiv 2020.

5 Kucharski AJ, Edmunds WJ. Case fatality rate for Ebola virus disease in West Africa. The Lancet 2014;384:1260.

6 Nishiura H, Klinkenberg D, Roberts M et al. Early epidemiological assessment of the virulence of emerging infectious diseases: A case study of an influenza pandemic. PLoS One 2009;4.

7 Linton NM, Kobayashi T, Yang Y et al. Incubation period and other epidemiological characteristics of 2019 novel coronavirus infections with right truncation: A statistical analysis of publicly available case data. Journal of Clinical Medicine 2020;9:538.

8 Abbott S MJ Hellewell J. NCoVUtils: Utility functions for the 2019-nCoV outbreak. doi:10.5281/zenodo.3635417 2020.